Build tent by Dao007forever · Pull Request #2089 · kvcache-ai/Mooncake

Dao007forever · 2026-05-13T02:21:59Z

Description

Fixes needed to build and run TENT without USE_REDIS, plus NVLink transport robustness fixes.

- fabric_allocator.cmake: Add POST_BUILD so the fabric allocator's custom build script runs after the target is built, not before.
- tent/CMakeLists.txt: Gate tests/ behind BUILD_UNIT_TESTS so TENT can build with unit tests disabled.
- tent/src/runtime/transfer_engine_impl.cpp: Drop the tent/metastore/redis.h include and replace the REDIS_*_DB_INDEX macros with local constexpr constants. The header isn't compiled when USE_REDIS=OFF, but the DB-index validation is still wanted.
- tent/src/transport/nvlink/nvlink_transport.cpp:
  - Save/restore the current CUDA device around IPC operations (cudaIpcGetMemHandle, cuMemGetAddressRange) so they run on the device the buffer was allocated on, then restore the caller's device.
  - Detect driver-allocated (VMM / cuMemCreate) pointers via cuMemRetainAllocationHandle and skip CUDA IPC export for them, since cudaIpcGetMemHandle only supports cudaMalloc-backed memory.
  - Log a descriptive error (addr, base, device, CUDA error string) when cudaIpcGetMemHandle fails, instead of just propagating the macro failure.
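
The save/restore pattern described for the NVLink transport could be sketched as a small RAII guard. This is only an illustration, not the code in this PR: `fake_cuda::getDevice` and `fake_cuda::setDevice` are hypothetical stand-ins for `cudaGetDevice` and `cudaSetDevice` so the sketch builds without the CUDA toolkit, and `ScopedDevice` and `exportOnOwnerDevice` are made-up names.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice / cudaSetDevice (assumption:
// real code would call the CUDA runtime and check the returned cudaError_t).
namespace fake_cuda {
inline int& currentDevice() {
    static int device = 0;  // simulated per-thread "current device"
    return device;
}
inline int getDevice() { return currentDevice(); }
inline void setDevice(int device) { currentDevice() = device; }
}  // namespace fake_cuda

// Switches to `target` on construction and restores the caller's device
// on destruction, including on early return.
class ScopedDevice {
   public:
    explicit ScopedDevice(int target) : saved_(fake_cuda::getDevice()) {
        if (saved_ != target) fake_cuda::setDevice(target);
    }
    ~ScopedDevice() {
        if (fake_cuda::getDevice() != saved_) fake_cuda::setDevice(saved_);
    }
    ScopedDevice(const ScopedDevice&) = delete;
    ScopedDevice& operator=(const ScopedDevice&) = delete;

   private:
    int saved_;
};

// Hypothetical export step: returns the device that was current at the
// point where cudaIpcGetMemHandle would be called.
inline int exportOnOwnerDevice(int owner_device) {
    ScopedDevice guard(owner_device);
    return fake_cuda::getDevice();
}
```

With the caller on device 2, `exportOnOwnerDevice(5)` observes device 5 inside the guarded scope, and the caller is back on device 2 once it returns.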

W20260512 17:32:07.779717 276846674057088 transfer_engine_impl.cpp:684] InternalError: cudaIpcGetMemHandle(&handle, (void*)base_ptr): invalid argument
    Raised at /home/inf-daole/Mooncake-dao/mooncake-transfer-engine/tent/src/transport/nvlink/nvlink_transport.cpp:263
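
For comparison with the bare macro failure above, a descriptive message carrying the address, base, device and CUDA error string might be assembled roughly as follows. The helper name `describeIpcExportFailure` and the message layout are assumptions made for illustration; in real code the error text would typically come from `cudaGetErrorString`.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not from the PR): formats the context that the
// description says is logged when cudaIpcGetMemHandle fails.
inline std::string describeIpcExportFailure(std::uintptr_t addr,
                                            std::uintptr_t base, int device,
                                            const std::string& cuda_error) {
    std::ostringstream os;
    os << "cudaIpcGetMemHandle failed: addr=0x" << std::hex << addr
       << " base=0x" << base << std::dec  // switch back to decimal
       << " device=" << device << " error=" << cuda_error;
    return os.str();
}
```

A message in this shape shows at a glance whether the address is a sub-allocation (addr differs from base) and which device the export ran on.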

How Has This Been Tested?

Built TENT locally with USE_REDIS=OFF, BUILD_UNIT_TESTS=OFF, USE_CUDA=ON, USE_MNNVL=ON via `scripts/build_local_cuda_tent.sh`. Exercised the NVLink transport against PyTorch-allocated tensors (caching allocator sub-allocations) and against driver-allocated VMM buffers to confirm both paths are handled.

Checklist

I have performed a self-review of my own code.
I have formatted my own code using `./scripts/code_format.sh` before submitting.
I have updated the documentation.
I have added tests to prove my changes are effective.

gemini-code-assist

Code Review

This pull request enhances the build system and CUDA transport logic by introducing a local CUDA build script, making unit test compilation optional, and improving the NVLink transport layer with CUDA device context management and VMM allocation support. Feedback from the review focuses on improving script portability by removing hardcoded user paths, addressing a security vulnerability in LD_LIBRARY_PATH construction, and ensuring robust error handling for CUDA driver API calls.

alogfans · 2026-05-27T02:45:41Z

+namespace {
+constexpr uint8_t kRedisMaxDbIndex = 255;
+constexpr uint8_t kRedisDefaultDbIndex = 0;
+}
+


Why add these constants instead of reusing REDIS_DEFAULT_DB_INDEX from elsewhere?

Dao007forever · May 27, 2026

We are building without USE_REDIS?
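
A minimal sketch of how validation can keep working with local constants when the Redis header is not compiled. The two constants are the ones shown in the diff. The helper `resolveDbIndex` and its fall-back-to-default policy are assumptions for illustration, not the logic in transfer_engine_impl.cpp.

```cpp
#include <cassert>
#include <cstdint>

namespace {
// Local replacements for the REDIS_*_DB_INDEX macros, as in the diff.
constexpr std::uint8_t kRedisMaxDbIndex = 255;
constexpr std::uint8_t kRedisDefaultDbIndex = 0;

static_assert(kRedisDefaultDbIndex <= kRedisMaxDbIndex,
              "default DB index must be within the valid range");

// Hypothetical helper: accept an index in [0, kRedisMaxDbIndex],
// otherwise fall back to the default (assumed policy).
std::uint8_t resolveDbIndex(int requested) {
    if (requested < 0 || requested > kRedisMaxDbIndex) {
        return kRedisDefaultDbIndex;
    }
    return static_cast<std::uint8_t>(requested);
}
}  // namespace
```

Because the constants live in an anonymous namespace in the same translation unit, this compiles whether or not USE_REDIS is enabled.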

staryxchen

LGTM but need to check CI status

Dao007forever · 2026-05-28T04:34:51Z

CI is failing with `/usr/bin/ld: final link failed: No space left on device`. Is the node full?

staryxchen · 2026-05-28T04:58:00Z

> CI is failing with `/usr/bin/ld: final link failed: No space left on device`. Is the node full?

You can push an empty commit to trigger CI again.

codecov-commenter · 2026-05-28T06:02:56Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

Dao007forever added 2 commits May 12, 2026 16:39

Build with TENT

a5ca257

Fix TENT failed start

bd880e3

Dao007forever requested review from alogfans, chestnut-Q and doujiang24 as code owners May 13, 2026 02:22

github-actions Bot added run-ci Transfer Engine labels May 13, 2026

gemini-code-assist Bot reviewed May 13, 2026

Comment thread mooncake-transfer-engine/tent/src/transport/nvlink/nvlink_transport.cpp

Comment thread scripts/build_local_cuda_tent.sh Outdated

Comment thread scripts/build_wheel.sh Outdated

Comment thread scripts/build_wheel.sh Outdated

stmatengss assigned alogfans May 26, 2026

alogfans reviewed May 27, 2026

Comment thread scripts/build_local_cuda_tent.sh Outdated

Revert

ac78d7c

Dao007forever requested review from 00fish0, dtcccc and staryxchen as code owners May 27, 2026 07:14

Dao007forever added 2 commits May 27, 2026 00:14

Merge branch 'main' into build_tent

f8d1976

Format

8b154be

staryxchen approved these changes May 28, 2026

Empty

08ba7f5

staryxchen merged commit 4569ce7 into kvcache-ai:main May 28, 2026
20 checks passed

A-Liuhao pushed a commit to A-Liuhao/Mooncake that referenced this pull request Jun 25, 2026

Build tent (kvcache-ai#2089)

751fbe9

* Build with TENT * Fix TENT failed start * Revert * Format * Empty

staryxchen merged 6 commits into kvcache-ai:main from Dao007forever:build_tent

4 participants
